Importance analysis of psychosociological variables in frailty syndrome in heart failure patients using machine learning approach

The prevention and diagnosis of frailty syndrome (FS) in cardiac patients requires innovative systems to support medical personnel, patient adherence, and self-care behavior. To do so, modern medicine uses a supervised machine learning approach (ML) to study the psychosocial domains of frailty in cardiac patients with heart failure (HF). This study aimed to determine the absolute and relative diagnostic importance of the individual components of the Tilburg Frailty Indicator (TFI) questionnaire in patients with HF. An exploratory analysis was performed using machine learning algorithms and the permutation method to determine the absolute importance of frailty components in HF. Based on the TFI data, which contain physical and psychosocial components, machine learning models were built based on three algorithms: a decision tree, a random decision forest, and the AdaBoost Models classifier. The absolute weights were used to make pairwise comparisons between the variables and obtain relative diagnostic importance. The analysis of HF patients’ responses showed that the psychological variable TFI20 diagnosing low mood was more diagnostically important than the variables from the physical domain: lack of strength in the hands and physical fatigue. The psychological variable TFI21 linked with agitation and irritability was diagnostically more important than all three physical variables considered: walking difficulties, lack of hand strength, and physical fatigue. In the case of the two remaining variables from the psychological domain (TFI19, TFI22), and for all variables from the social domain, the results do not allow for the rejection of the null hypothesis. From a long-term perspective, the ML based frailty approach can support healthcare professionals, including psychologists and social workers, in drawing their attention to the non-physical origins of HF.

There is ample scientific evidence that psychosocial deficits coexist with abnormal aging (sometimes referred to as frailty syndrome [FS]), and even indications of their direct impact on this phenomenon [1][2][3] . Older people diagnosed with frailty are at risk of prolonged and repeated hospitalizations, which are often the result of neglect in the psychosocial sphere. Limited support of relatives, loneliness, and memory deficits affect the timeliness of visits to the doctor, signaling disturbing changes and compliance with medical recommendations (e.g., taking medications). Additionally, numerous scientific studies indicate that in patients diagnosed with FS, there www.nature.com/scientificreports/ Research instruments. The FS of the patients was assessed using the Tilburg Frailty Indicator (TFI) questionnaire. The TFI questionnaire consists of two parts: frailty determinants and frailty components. The first part contains sociodemographic data, while the second part covers physical, psychological, and social components 7 . The eight components of the TFI that belong to the physical domain are physical unhealthiness, unintentional weight loss, walking difficulties, difficulty in maintaining balance, poor hearing, poor vision, lack of strength in the hands, and physical tiredness (TFI11-TFI18). The psychological domain includes four components: problems with memory, a drop in mood, agitation and irritability, and inability to cope with problems (TFI19-TFI22). The social domain of the TFI contains three components: living alone, lack of social relations, and lack of social support (TFI23-TFI25) (see Table 1).
Exploratory analysis. The work analyzed questions from the TFI questionnaire (15 questions concerning physical and psychosocial components: TFI11-TFI25). Figure 1 below shows the average values of the positive responses and the value of the absolute difference between them. The absolute difference is necessary because of inverted questions whose values did not change during data preparation (TFI11, TFI22, and TFI25). Positive responses for physical (TFI11-TFI18), psychological (TFI19-TFI22), and social (TFI23-TFI25) were found to have disproportionate scores for both groups. Some of the positive answers had a high average in the group of people without frailty syndrome, and this value increased relatively slightly in the group of frail patients (e.g., TFI23 0.31-0.16 = 0.15; for comparison, TFI17 0.61-0.07 = 0.54). These results became a premise for further research related to the analysis of the diagnostic importance of individual TFI variables.
The results of the exploratory analysis, supported by pairwise comparison methods, were used to obtain answers to the research questions concerning the individual diagnostic criteria (variable importance) of TFI. It www.nature.com/scientificreports/ was limited to comparing psychosocial variables with three selected variables from the physical domain, which resulted in 21 comparisons.
To determine the absolute importance of the explanatory variables in the machine learning classification models, the permutation method was used 24 . The absolute weightings were then used to determine the relative diagnostic importance. To assess the statistical significance between the differences in the importance of the variables, post hoc multiple comparisons were made using a t-test. Statistical significance for relative values was determined using Bonferroni correction 25 .
The value of the z statistic allowed us to determine the significance of the selected variable (however, since the z statistic depends on the number of permutations, it is easier to follow the p value). In the case of the importance of absolute variables, a one-tailed test was used, whereas for comparisons between pairs of variables, a two-tailed test was performed to assess the absolute value of the z statistic. The interpretation of the sign indicated which of the compared variables had a greater diagnostic value. These statistics were translated into the p value.
Machine learning. For the purposes of machine learning, the sample was divided into subsets: training, validation, and testing. After missing data were discarded, the sample of 666 patients was limited to 612 patients (4/6 of these were used for the training procedure, 1/6 for validation, and the remaining 1/6 for the testing activities). Data division was performed using a randomizing function.
Based on the above theoretical assumptions and the purpose of the research, the following machine learning models were used, as implemented in the Scikit-learn statistical package (https:// scikit-learn. org): decision tree, random forest, and AdaBoost classifier. During the training phase, the models were trained on the training set (4/6 of the data) with different hyperparameters, and their performance was evaluated on the validation data (1/6 of the data). Hyperparameters of the models were selected based on the validation scores during this phase in such a way that they did not favor either true positive or true negative values (in other words, true positive and true negative values were treated on an equal footing). In the case of many possible configurations, the model with the lowest complexity was selected from the heatmap. Following these rules, a model with a data fraction per leaf parameter of 0.1 (minimum sample per leaf, calculated as a data fraction) and a depth of 3 (maximum depth) was selected for the decision tree algorithm. On the validation subset, this model achieved a detection accuracy level of 88%, both for people with a diagnosis of frailty and for those without a frailty diagnosis. Models with these parameters were then subjected to a test procedure.
For a random forest, a model with a data fraction per leaf parameter of 0.03 (minimum sample per leaf) and a depth of 4 (maximum depth) was selected. This model achieved, on the validation subset, the correct detection of people with FS at the level of 95% and people without an FS diagnosis at a similar level of 94%. The number of trees was arbitrarily set to 100.
For the AdaBoost classifier, a model with parameters of 200 (number of trees) and 1 (maximum depth) was selected. On the validation subset, this model achieved a detection accuracy level of 100%, both for people with FS and for those without FS. Consent to participate. Informed consent was obtained from all study participants and/or their legal guardians. Table 2 shows the characteristics of the sample. There was a large disproportion between the number of patients diagnosed with FS (N = 562) and those without an FS diagnosis (N = 104). The mean age was higher by approximately five years in patients with FS (73 years) compared to patients without FS (68 years). The comparison of the percentage of single people with a diagnosis of FS and without FS (47.2% and 63.5%, respectively) showed a clear disproportion between the two groups. In the group of people without FS, more patients exceeded the income equal to PLN 2101 (28.8%) than in the group of people diagnosed as frail (21.0%). Table 3 shows the true positive and true negative test values for the machine learning models built and verified in the previous phases (training and validation). The decision tree yielded 79% true positive and 87.5% true negative values; that is, the classification ability of this model was higher for people without FS than for people diagnosed with frailty. Learning with a random forest of decision trees led to a deterioration in the recognition of people with FS (93%) compared to those without FS (100%), but in the case of this model, the overall classification accuracy was much higher than that obtained by the decision tree. The AdaBoost algorithm was the only classifier that allowed better recognition of a person diagnosed with frailty (100%) than people without FS (93.8%), and it showed an overall classification accuracy similar to that of a random decision forest.

Results
Absolute diagnostic importance of TFI variables for the sample of patients with HF. In Fig. 2 and in the corresponding Fig. 3, which variables in particular models turned out to be important was assessed. It should be remembered that achieving p values higher than 0.05 does not necessarily imply that the model did not use the given variable. The diagnostic importance of the variables was measured by changes in the accuracy of the model, as described in the subsection on the assessment of the importance of the explanatory variables.
At the testing stage, the use of the decision tree resulted in the physical variable TFI17 achieving diagnostic significance (µ = 0.082, σ = 0.036, z = 230.1, p = 0.00), and the psychological variable TFI20 achieved diagnostic significance (µ = 0.062, σ = 0.025, z = 248.9, p = 0.00), while for the physical variable TFI13, the significance was µ = 0.014 (σ = 0.019, z = 74.1, p = 0.00). With the random decision forest, the following variables were of greatest www.nature.com/scientificreports/       www.nature.com/scientificreports/ Considering the classification accuracy of individual models, the AdaBoost and random forest algorithms had similar accuracies; thus, both models were selected ex aequo to verify the hypotheses, whereas the decision tree model was rejected due to lower accuracy and is presented for illustrative purposes only. Figure 4 shows the relative diagnostic importance of variables in the decision tree model based on the test subset. All TFI variables used to build this model were important in the analysis of absolute values; therefore, all possible pairs of variables were compared. TFI17 was more important than TFI20 (Δ = 0.02, z = 46, p ≤ 0.0167) and TFI13 (Δ = 0.07, z = 169, p ≤ 0.0167). The relative difference between TFI20 and TFI13 was Δ = 0.05 (z = 154, p ≤ 0.0167). Based on the results obtained with the decision tree model, the psychological variable TFI20 ("Have you experienced a drop in mood over the last month?") was indicated as the second most important factor for www.nature.com/scientificreports/ the correct classification of people with FS. However, as many as two of the three variables used in the model belonged to the physical components of the TFI questionnaire. The relative diagnostic importance of the variables for the test subset with a random forest is presented in Fig. 5. The differences between the importance of the variables for the AdaBoost algorithm on the test subset are presented in Fig. 6. Based on the hypotheses put forth in the study regarding the comparison of physical components with psychosocial components, comparisons between selected physical components (TFI13, TFI17, and TFI18) and psychosocial components (TFI19-TFI25) were critical.
The diagnostic validity of the psychological variable TFI19 (memory problems) obtained using a random forest was significantly higher than that of TFI13 (walking difficulties; Δ = 0.004, z = 25, p ≤ 0.0005), lower than TFI17 (lack of hand strength; Δ = 0.001, z = − 6, p ≤ 0.0005), and lower than TFI18 (physical fatigue; Δ = 0.002, z = − 10, p ≤ 0.0005). In the case of the AdaBoost classifier, TFI19 did not participate in the construction of the model, indicating that for a specific subset of data, this variable did not contribute; that is, it did not affect the classification accuracy of FS. The consequence of this is that each of the other variables was diagnostically more important than TFI19. As a result of the above analysis, there is insufficient evidence to conclude that the psychological variable TFI19 is diagnostically more important than TFI13 (walking difficulties) or less important than TFI17 (lack of hand strength) and TFI18 (physical fatigue).
The diagnostic validity of the next psychological variable TFI20 (Have you experienced a drop in mood during the last month?) obtained with the use of a random forest was significantly greater than TFI13 (walking difficulties, Δ = 0.029, z = 134, p ≤ 0.0005), TFI17 variable (lack of hand strength, Δ = 0.024, z = 97, p ≤ 0.0005) and TFI18 (physical fatigue, Δ = 0.023, z = 105, p ≤ 0.0005). The AdaBoost classifier showed that the diagnostic importance of TFI20 was statistically significantly lower than that of TFI13 (walking difficulties, Δ = 0.001, z = − 5, p ≤ 0.0005), but higher than TFI17 (lower hand strength, Δ = 0.007, z = 44, p ≤ 0.0005) and TFI18 (physical fatigue, Δ = 0.010, z = 65, p ≤ 0.0005). Thus, there is insufficient evidence to conclude that the psychological variable TFI20 is diagnostically more important than TFI13 (walking difficulties). However, both algorithms unequivocally confirmed that the variable TFI20 is diagnostically more important than TFI17 and TFI18: lack of hand strength and physical fatigue, respectively.
The importance of the sociological variable TFI23 (Do you live alone?) with a random decision forest turned out to be diagnostically greater than that of TFI13 (walking difficulties, Δ = 0.010, z = 68, p ≤ 0.0005), TFI17 (lack of hand strength, Δ = 0.005, z = 26, p ≤ 0.0005) and TFI18 (physical fatigue, Δ = 0.004, z = 28, p ≤ 0.0005). The AdaBoost classifier showed that the diagnostic importance of the sociological variable TFI23 was much lower than that of TFI13 (walking difficulties, Δ = 0.025, z = − 165, p ≤ 0.0005), TFI17 (lack of hand strength, Δ = 0.017, z = − 133, p ≤ 0.0005,) and TFI18 (physical fatigue, Δ = 0.014, z = − 119, p ≤ 0.0005). In the case of the AdaBoost classifier, there was a similarity between the comparative results of TFI22 and TFI23 against all other variables. However, for TFI23, based on the random forest and AdaBoost results, it is impossible to unequivocally determine the relative diagnostic importance of this variable over the three physical variables: walking difficulties, lack of hand strength, and physical fatigue.
For the sociological variable TFI24 (Do you ever miss the company of other people?), its diagnostic importance in a random forest was significantly lower than that of TFI13 (walking difficulties, Δ = 0.001, z = − 7, p ≤ 0.0005), TFI17 (lower hand strength, Δ = 0.006, z = − 30, p ≤ 0.0005) and TFI18 (physical fatigue, Δ = 0.007, z = − 38, p ≤ 0.0005). The AdaBoost classifier showed that the diagnostic importance of TFI24 was statistically significantly lower than that of TFI13 (walking difficulties, Δ = 0.003, z = − 15, p ≤ 0.0005), but greater than that of TFI17 (lower hand strength, Δ = 0.006, z = 35, p ≤ 0.0005) and TFI18 (physical fatigue, Δ = 0.009, z = 58, p ≤ 0.0005). Thus, it is concluded that the sociological variable TFI24 is less important than the variable walking difficulties. There was no clear evidence to confirm the hypothesis that TFI24 is more important than the physical variables: lack of strength in the hands and physical fatigue.
The importance of the third (and last) sociological variable, TFI25 (Do you receive a lot of support from others?), in a random decision forest was significantly less diagnostically important than TFI13 (walking difficulties, Δ = 0.002, z = − 12, p ≤ 0.0005), TFI17 (lower hand strength, Δ = 0.007, z = − 36, p ≤ 0.0005) and TFI18 (physical fatigue, Δ = 0.007, z = − 48, p ≤ 0.0005). The AdaBoost classifier showed that the diagnostic importance of TFI24 was less diagnostically important than that of TFI13 (walking difficulties, Δ = 0.009, z = − 52, p ≤ 0.0005), TFI17 (lower hand strength, Δ = 0.001, z = − 7, p ≤ 0.0005) and TFI18 (physical fatigue, Δ = 0.002, z = 13, p ≤ 0.0005). The sociological variable TFI25 was found to be less important than the physical variable linked to walking difficulties. www.nature.com/scientificreports/ There was no clear evidence to confirm the hypothesis that the variable TFI25 is more important than the other two variables from the physical domain: lack of strength in the hands and physical fatigue. It is noteworthy that the difference between determining the importance of variables by the AdaBoost and random forest models, which, despite similar accuracy results, did not assign similar importance to all variables (see Fig. 3). This may be because the random decision forest achieved better results for truly negative values (i.e., in detecting persons without FS) and the AdaBoost classifier for true positive values (detecting frail persons), that is, differences in the algorithms themselves.

Discussion
The results of the machine learning models showed that some of the psychological questions from the TFI questionnaire had very high absolute diagnostic validity. In the case of relative diagnostic importance, when checking whether the variables from the psychological and social components were more important than the physical components, the most interesting for us were pairs of variables consisting of one question belonging to the physical component and one question belonging to the psychosocial components. Questions that did not differ significantly from each other indicated non-significant differences between these variables, and the variables that differed by very small values indicated very similar importance.
To verify the hypotheses, the results of the two best algorithms were considered-the random forest and the AdaBoost classifier. A pairwise comparison of components from the physical domain with psychosocial components was made for three selected components characteristic of FS: TFI13, TFI17, and TFI18. The psychological variable TFI20 was found to be more diagnostically important than the other variables from the physical domain: TFI17 and TFI18. Further, the psychological variable TFI21, which diagnosed the patient's irritability, turned out to be diagnostically more important than all three physical variables considered: TFI13, TFI17, and TFI18. In the case of the other two variables from the psychological domain (TFI19 and TFI22) as well as all variables from the social domain, the obtained results do not allow for the rejection of the null hypothesis. This could be because these variables were less diagnostically important in relation to the three variables analyzed from the physical domain, or because the two best algorithms assigned different diagnostic importance to these variables.
The analysis of the results concerning psychological variables showed that a feeling of irritability (nervousness or excitement) over the last month was the most important diagnostic question among all psychosocial TFI questions. This highlights the problem of stress in people with FS among the population of patients with HF. By contrast, a drop in mood over the last month, which was more diagnostically important than TFI17 and TFI18, may indicate a risk of depression in people with FS in the cardiology group studied. The results show the functioning deficits of the population of people with FS and are consistent with the results of the current research.
It is interesting that variable TFI19, diagnosing memory problems characteristic of old age, was not recognized by any of the models as one of the most important variables, and was not used at all two of them-the decision tree and the AdaBoost classifier. Moreover, TFI22, the ability to deal with problems, although used to construct the two best machine learning models, was not more diagnostically important than any of the three physical components analyzed.
Social variables did not gain a clear diagnostic advantage over variables from the physical domain. However, TFI24, diagnosing the need for relationships with other people, was one of the most important diagnostically important variables in the AdaBoost classifier. The ambiguity of the results obtained for this variable indicates the need for a deeper analysis of the phenomenon of longing related to relationships with other people. The reverse situation occurred in the case of TFI23 because it was considered one of the most important diagnostically in the random decision forest algorithm but not by the AdaBoost algorithm. In addition, in this case, research on the importance of patients living alone should be intensified. The phenomenon of loneliness is currently a subject of interest in research on FS 26 .
Notably, the differences between the validation and test results between the same models may indicate differences in subsets that could not be avoided, even with the randomization process. Although all data should be analyzed, test models are the most important for model evaluation and drawing conclusions. The differences between the validation and the test usually indicate the existence of the model overfitting effect; thus, future research should minimize these differences using measures that overcome this problem. One such strategy would be to use more data and repeat the training, validation, and testing processes. Another option is to use an already existing model from the validation and then run only the test process on the new data, which is less time consuming but does not guarantee an improved model (if it is obtainable). The process of dividing the variables into training, validation, and test subsets could also be repeated to ensure that these subsets differ as little as possible.
The main aim of this work was to develop a pilot study using existing frailty testing tools to find the most diagnostically important variables of this syndrome in the population of patients with HF. In the long term, this knowledge could be used by doctors, psychologists, and social workers, drawing attention to the biggest problems in the non-physical sphere of patients. Another motivation for the study was the possibility of developing an improved frailty testing tool adapted to the symptoms of patients with HF. We should not exclude the possibility that for a specific group of patients-for example, cardiac patients with FS-there are some relationships between variables that are less noticeable for the general population of patients diagnosed with frailty. Analyses of this type were not, however, the aim of this study, as for this purpose, it would be necessary to consider additional patient populations without HF.
Identifying the psychosocial factors related to the adverse condition of frail patients with HF is crucial for improving care quality and reducing costs. Existing psychosocial factor identification approaches are primarily based on guidelines or observational data on frailty. Our ML-based frailty algorithm as a possible module of the healthcare analytics system can identify psychosocial variables that are not in the existing known factors. www.nature.com/scientificreports/ is required in real-time by asking medical staff questions to the patients they care for. For instance, our study showed that experiencing feelings of irritability (nervousness or excitement) over the past month proved to be the most diagnostically important question of all the TFI psychosocial items. This result draws attention to the problem of stress in people with heart failure. On the other hand, the decrease in mood over the last month, which is diagnostically more important than physical components of FS (the TFI17 and TFI18), may indicate the risk of depression in this cardiac patient population. The proposed machine learning approach can infer such implicit information on the psychological condition of HF patients and extract relevant information for recommending psychological interventions.
Another important premise of this work was the analysis of the risk of an incorrect diagnosis based on incomplete information about the patient. In the era of research focused on large datasets, despite efficient diagnostic systems, medical and social services must make immediate decisions based on limited, incomplete patient data. This is especially true of semi-open diagnostic systems, where a diagnosis can be given with the likelihood of the diagnosis being correct, depending on the data, the quantity of which may be limited, or when creating traditional, abbreviated versions of the questionnaires. Thus, specialists could receive valuable hints regarding what information or variables can be replaced with other information or variables in the model. This work is a contribution to future research, allowing us to verify the impact of individual variables on the appearance of FS and its health over many years, which may become possible with increasingly developed panel databases. It should also be emphasized that although this work did not attempt to build a model for patients with pre-frail syndromes, the approach presented in this paper shows how subtle differences in multivariate models can be captured. Such an approach will be used in the near future for preventive purposes in the process of holistic treatment. Psychologists could be involved in this process, inter alia, thereby gaining valuable knowledge about the patient population and the areas of necessary influence.

Conclusion
This study analyzed the diagnostic validity of the psychosocial and physical criteria used in the diagnostic process of frail patients in an elderly population suffering from HF. Following the AI approach and the TFI survey data, including the physical, psychological, and social domains, three machine learning models were constructed based on separate algorithms of the decision tree, random decision forest, and AdaBoost classifier. The findings show that none of the variables belonging to the social domain were more diagnostically important than the variables from the physical domains linked to experiencing difficulties due to walking difficulties, lack of hand strength, and physical fatigue. In the case of psychological criteria for FS, the factors of irritability were diagnostically more important than all three analyzed physical variables, while the variable of depression mood was diagnostically more important than the variables related to physical weakness, that is, lack of hand strength and physical fatigue.

Limitation
The study has some limitations. Firstly, since the effects of psychological and sociological deficits (coexistence of phenomena) are shown, the ambiguity of the influence and coexistence of psychosocial factors on frailty is possible. Therefore, there may be unclear conclusions about the impact of psychosocial factors and their dependencies on frailty. This, in turn, might implicate suggest unknown causal dependencies between disease and psychosocial deficits. Psychological and social deficits are part of the frailty model. As these deficits are included in the operational definition of frailty, the Tilburg Frailty Indicator (TFI) embraces such measures. According to the conceptual model of frailty 27 , multidimensional frailty along with physical, psychological, and social components should be considered a consequence of individual diseases and multimorbidity. Furthermore, the use of machine learning does not solve the problem of causal inference in observational datasets. Algorithms can be useful in predicting outcomes, but predictors per se do not explain causality. The second limitation were input hypotheses. The number of the hypotheses was limited to comparing psychosocial variables with three selected variables from the physical domain, which resulted in obtaining 21 comparisons. This selected input hypotheses were driven by clinical background of patients with heart failure. Obviously, further study may be unrestricted and embrace more hypothetical directions.

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.